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ABSTRACT 

Next-generation sequencing has great potential for 
application in bacterial transcriptomics. However, 
unlike eukaryotes, bacteria have no clear mechan- 
ism to select mRNAs over rRNAs; therefore, rRNA 
removal is a critical step in sequencing-based 
transcriptomics. Duplex-specific nuclease (DSN) is 
an enzyme that, at high temperatures, degrades 
duplex DNA in preference to single-stranded DNA. 
DSN treatment has been successfully used to nor- 
malize the relative transcript abundance in mRNA- 
enriched cDNA libraries from eukaryotic organisms. 
In this study, we demonstrate the utility of this 
method to remove rRNA from prokaryotic total 
RNA. We evaluated the efficacy of DSN to remove 
rRNA by comparing it with the conventional sub- 
tractive hybridization (Hyb) method. Illumina deep 
sequencing was performed to obtain transcrip- 
tomes from Escherichia coli grown under four 
growth conditions. The results clearly showed that 
our DSN treatment was more efficient at removing 
rRNA than the Hyb method was, while preserving 
the original relative abundance of mRNA species in 
bacterial cells. Therefore, we propose that, for bac- 
terial mRNA-seq experiments, DSN treatment 
should be preferred to Hyb-based methods. 

INTRODUCTION 

RNA-seq is a novel method for elucidating the transcrip- 
tome of cells. This method uses high-throughput next- 
generation sequencing technology and has revolutionized 
the way in which gene expression profiles are examined 
(1). The RNA molecules present in prokaryotic cells are 
mostly rRNA species, whereas mRNA constitutes only 
1-5% of total RNA. Therefore, efficient enrichment of 



mRNA is a critical step for successful mRNA-seq experi- 
ments. Because mRNA molecules extracted from bacterial 
cells mostly lack poly-A tails, the methods developed so 
far have focused on removing non-mRNAs rather than 
selecting mRNAs. Several techniques (2-6) have been 
applied to deplete rRNA from the total bacterial RNA 
population, but the efficiency and robustness of these 
methods have not been objectively compared. Recently, 
He et al. (7) compared the two most popular rRNA 
removal methods, namely subtractive hybridization 
(Hyb) and exonuclease digestion, using Illumina-based 
RNA-seq of synthetic microbial metatranscriptomes. 
Their results suggested that the Hyb method introduced 
less bias in the relative proportion of the mRNA popula- 
tion compared to exonuclease digestion. However, no 
study has been conducted to evaluate these rRNA 
removal methods in mRNA-seq based on pure cultured 
strains. 

Zhulidov et al. (8) introduced a simple cDNA normal- 
ization method based on duplex-specific nuclease (DSN) 
aimed at enhancing the detection of rare transcripts in 
eukaryotic cDNA libraries by decreasing the prevalence 
of highly abundant transcripts. This DSN method 
includes the denaturation of cDNA, its subsequent 
reassociation and enzymatic degradation of the 
double-stranded (ds) DNA fraction using DSN isolated 
from the Kamchatka crab (9). Because the Hyb rate for 
each transcript is proportional to the square of its concen- 
tration (10), abundant transcripts form ds DNA more ef- 
fectively during the reassociation step and are subjected to 
DSN-mediated degradation. DSN has a strong preference 
for cleaving dsDNA, and there is no significant cleavage of 
single-stranded (ss) DNA under the directed working con- 
ditions of the enzyme (9). 

The DSN method has been successfully applied to nor- 
malize transcripts in cDNA libraries from various eukary- 
otes (1 1). The treatment was usually performed on cDNA 
libraries enriched with mRNAs using either mRNA- 
specific poly(A) tail selection in the RNA state (12) or 
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an oligo(dT) primer approach for reverse transcription 
from total RNA (13,14). However, the application of the 
DSN method for the purpose of rRNA removal from total 
RNA has not been reported in prokaryotic or eukaryotic 
transcriptome studies. Here, the use of DSN normaliza- 
tion as an rRNA removal method was evaluated and 
compared to the conventional subtractive Hyb method. 
Illumina deep sequencing of the transcriptomes of 
Escherichia coli grown under four conditions 
demonstrated that the DSN method is suitable for 
rRNA removal while preserving the original relative abun- 
dance of each mRNA transcript. 

MATERIALS AND METHODS 

Bacterial cultures 

Escherichia coli K-12/MG1655 was grown in LB medium 
(Difco) at 37°C with continuous shaking. The freshly 
grown cells were inoculated into two culture flasks con- 
taining LB medium and incubated under anaerobic or 
aerobic conditions. At exponential phase (OD 0 . 6 ), the 
aerobic culture was evenly distributed into three sterile 
culture flasks under aseptic conditions. Three aliquots 
were then subjected to three different conditions as 
follows. The first aliquot was subjected to instant RNA 
extraction at exponential phase, and the second was ex- 
tracted at stationary phase (OD 2 . 0 ). The third aliquot was 
subjected to heat shock stress by incubating the culture at 
42°C for 30min. The anaerobic cells were also grown to 
exponential phase (OD 0 6 ), and the cells were subjected to 
instant RNA extraction. 

RNA extraction, rRNA removal and sequencing library 
construction 

Total RNA was independently extracted from the four 
culture conditions (aerobic exponential, aerobic station- 
ary, aerobic heat shock and anaerobic exponential) using 
the hot phenol method with additional purification using 
an RNeasy Mini kit (Qiagen) following the manufactur- 
er's instructions. The quantity and quality of the RNA 
were evaluated before and after the rRNA removal 
processes using RNA electropherograms (Agilent 2100 
Bioanalyzer) and the RNA integrity number (RIN) (15). 
Total RNAs from cultures treated under the four condi- 
tions were aliquoted into three portions. The first aliquot 
of total RNA (200 ng) was used to generate a sequencing 
library using an mRNA-seq library prep kit (Illumina) 
without any other treatment. The RNA was directly sub- 
jected to fragmentation without the mRNA purification 
step (poly-A selection). The second aliquot was subjected 
to a subtractive Hyb-based rRNA removal process using 
the MICROBExpress Bacterial mRNA Enrichment Kit 
(Ambion). The resultant RNA (100 ng) was used for 
sequencing library construction using the mRNA-seq 
library prep kit, omitting the poly-A selection step. The 
last aliquot (200 ng) was used to generate a sequencing 
library using an mRNA-seq library prep kit with some 
modifications. The RNA was directly subjected to frag- 
mentation without the mRNA purification step (poly-A 
selection). The first- and second-strand cDNA was 



synthesized from the fragmented RNA using random 
hexamer primers. End repair, A-tailing, adaptor ligation, 
cDNA template purification and enrichment of the 
purified cDNA templates using PCR were then per- 
formed. The resulting sample libraries were subjected to 
DSN treatment using the Trimmer-Direct cDNA 
Normalization Kit (Evrogen) as follows. The sample 
library mixed with Hyb buffer was denatured at 98°C 
for 2min and incubated at 68°C for 5h. DSN buffer 
and 2ul of the DSN enzyme were added to the mixture 
and incubated at 68° C for 25min followed by the addition 
of stop solution. After purification of the DSN-treated 
library using SPRI beads, the library was enriched by 
PCR using PE1.0 and PE2.0 primers. The library con- 
struction was completed by final purification of the PCR 
product using SPRI beads. Because the commercial Hyb 
and DSN kits already employed highly optimized re- 
agents and conditions for E. coli RNA, we adopted the 
procedures suggested by the respective manufacturers. 
The summary of the experimental procedures of the 
control, DSN and Hyb methods is given in 
Supplementary Figure SI. 

Sequencing and alignment of the transcriptome 

RNA deep sequencing was performed using two runs of 
the Illumina Genome Analyzer IIx to generate 
single-ended 36-bp reads. The genome sequence and func- 
tional annotation information of this strain were obtained 
from the NCBI database (accession number 
NC_000913.2). Quality-filtered reads were aligned to the 
reference genome sequence using CLC Genomics 
Workbench 4.0 (CLC bio). Mapping was based on the 
minimal length of 32 bp with an allowance of up to two 
mismatches. The relative transcript abundance was 
measured in reads per kilobase of exon per million 
mapped sequence reads (16) (RPKM) using the following 
formula: 

10 9 x (number of mapped 
reads of an mRNA) 

RPKM = 

(total number of mapped reads in a sample) 
x(sum of the exons in base pairs) 

Determination of detection threshold 

The library-size normalization was performed by dividing 
the raw read count of each mRNA by the number of total 
mapped reads in each Illumina lane and then multiplying 
by the average total mapped read numbers of four control 
samples. The detection threshold was determined by 
calculating the number of reads of an mRNA that was 
significantly different from zero read considering the ex- 
perimental errors. For this calculation, duplicate control 
RNA data were analyzed under the assumption that 
controls 1 and 2 should have an identical true expression 
level, but measurement errors may cause different 
observed expression levels. The sample standard deviation 
was used to calculate the confidence interval that distin- 
guished the observed expression level of an mRNA from 
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an undetected mRNA. The detailed calculations are as 
follows. 

It was assumed that Xa(xa) was an observed expression 
level for control 1 (control 2) and that Uy was an unknown 
true expression level for both x n and x i2 . If x n was 0, it 
was removed from the analysis, and it was assumed that 
Xn was larger than 0. For the measurement error, the fol- 
lowing log-normal distributions for x a and x i2 were 
assumed, respectively: 

log 10 x n ~N(jiu<rfy 
logio*2 ~#U*i.o£ j ). 

where x n and xq are independent. The variance in x n and 
Xa depends on li, because the proportion of the measure- 
ment error got smaller for higher (i„ and results in 
Supplementary Figure S3a also confirmed that the 
inverse of their variance was proportional to the true ex- 
pression level. Because the mean parameters for xn and x i2 
were assumed to be equal, the following could be 
obtained: 

logio x a - logio x a ~ ^(o,o^) =^ -^log l0 ^ ~ N { Q ^l) ■ 

As a result, the confidence interval for x n given \i t could 
be empirically calculated using -^log 10 To minimize the 
dependence of the variance on ix;, only the observed 
intensities for which ^log 10 xaxa was <1 were con- 
sidered. If the variance was denoted as a\ x , the normality 
of the observed expression level resulted in the confidence 

interval {—a, 3. at a 0.001 significance level. 
Statistical analyses 

Statistical analyses were performed using the RPKM 
values of mRNAs detected from all experimental condi- 
tions after detection threshold filtering using library 
size-normalized data. The statistical significance 
(T-value) of differences in rRNA removal efficiency was 
obtained using the likelihood ratio test based on a 
generalized linear model based on Poisson regression. 
The robustness of mRNA relative abundance conserva- 
tion was analyzed using general linear regression and 
Lowess nonlinear regression models. Hierarchical cluster- 
ing was performed using the unweighted pair group 
method with arithmetic mean (UPGMA) clustering 
algorithm using Pearson's product-moment correlation 
coefficient. All statistical analyses were performed using 
the R package, version 2.11.0 (www.r-project.org). 

RESULTS 

Illumina deep sequencing 

Four RNA samples prepared from E. coli K12/MG1655 
cells grown under four conditions (aerobic exponential, 
aerobic stationary, aerobic heat shock and anaerobic ex- 
ponential) were aliquoted and subjected to three different 
protocols. The first aliquot (control) was processed 



without any rRNA removal treatment; the second 
aliquot was treated using DSN normalization (DSN; 
Trimmer-Direct cDNA Normalization Kit, Evrogen); 
and the third aliquot was treated using subtractive Hyb 
(MICROBExpress Bacterial mRNA Enrichment kit, 
Ambion). The RNA quality measured using RNA 
electropherograms showed that the extracted total RNA 
was of good quality, with an average RNA integrity 
number (RIN) of 9.2 (Supplementary Figure S2). The dis- 
appearance of rRNA peaks after the rRNA removal 
process using Hyb was also visualized in the 
electropherograms . 

DNA sequencing was performed using two eight-lane 
flow cells of the Illumina Genome Analyzer IIx to generate 
single-ended 36-bp reads. The first run contained eight 
lanes of untreated control samples, which consisted of du- 
plicate lanes of each condition to allow an increased 
number of mRNA reads for samples. The second run con- 
tained four lanes of DSN-treated samples and four 
Hyb-treated samples. The number of quality-filtered 
reads for each sample ranged from 32 to 39 million, and 
>99.0% of the reads (average 99.4%) were mapped to the 
reference genome sequence, indicating good sequencing 
quality and negligible contamination (Table 1). The 
Illumina reads of duplicate control sample lanes were 
combined for further analyses. 

The sequence coverage is defined as the proportion of 
mRNAs that have one or more mapped reads with respect 
to all annotated genes (4493 genes in the reference E. coli 
genome). The average coverage of the control samples was 
98.6%, which is not significantly different from those 
treated with DSN (99.1%) or Hyb (99.2%), suggesting 
that the sequencing depths in this study were sufficient 
(Supplementary Table SI). The resultant expression 
profile of E. coli is shown in Supplementary Table S2, 
and the top 10 highly expressed genes in each growth 
and rRNA treatment conditions are summarized in 
Supplementary Table S3. 

The composition of major RNA types, namely rRNA, 
tRNA, mRNA and other RNA (miscRNA), was similar 
to that of prokaryotes (7). 

rRNA removal efficiency 

Illumina deep sequencing of the total RNA in E. coli 
revealed that rRNA was indeed the major component in 
all four growth conditions, ranging from 93.1% to 94.5%, 
whereas a substantially smaller proportion (2.9-3.8%) was 
identified as mRNA (Table 1). After the rRNA removal 
treatments, the proportion of rRNA in the cDNA libraries 
was reduced to 13.3-38.6% using DSN and 60.4-79.5% 
using Hyb (Figure 1). The mapped mRNAs increased 
17.3-fold using DSN and 6.5-fold using Hyb compared 
to the untreated controls. The difference in the efficiency 
of rRNA removal treatments was statistically significant 
(P = 0.00007). The rRNA removal efficiency of DSN was 
2.5 times higher than Hyb. However, the ratio of 
unmapped reads was higher in samples that underwent 
Hyb (0.86%) than those treated with DSN (0.39%) or 
than the control (0.50%). The efficiency of rRNA 
removal in DSN method increased in the order of rRNA 
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The statistics of untreated control samples were obtained from data combined from two lanes. 



size (23 S> 16S>5S), but the ratio of 5S:16S:23S was 
pretty conserved in DSN (0.1:35:65) compared to control 
(0.02:32:68), while the ratio was significantly shifted in 
Hyb (0.4:9:91). 

RNA-seq detection threshold 

To improve the accuracy of the RNA-seq analysis, the 
mRNA detection threshold was determined to exclude 
Illumina reads in which the expression level was severely 
affected by measurement errors. It was assumed that the 
biological duplicates, namely controls 1 and 2 under the 
same conditions, had unknown identical means and that 
the measurement errors caused the differences in the 
Illumina read counts. To determine the detection thresh- 
old value, read count 1 was used as the one-tailed upper 
limit of minimum read at the 0.001 significance level. 
Because the variance of the measurement decreased with 
the level of the unknown true expression level 
(Supplementary Figure S3a), the area in which the 
variance showed normal distribution was determined 
first (Supplementary Figure S3b) to minimize the depend- 
ence of the variance on expression values. For the selected 
area, the variance was estimated, and the detection thresh- 
old of an mRNA was determined to be six Illumina reads 
in our library size-normalized data at the 0.001 signifi- 
cance level. Consequently, 94.58% of mRNAs that had 
more than six mapped reads in all samples that passed 
the filtering. The 234 failed ORFs (5.42%) were omitted 
from further statistical analyses. By removing the insignifi- 
cant low read-count genes, the linearity of regression 
between the two duplicate controls was improved 
(Supplementary Figure S3c and d). 

Robustness of mRNA relative abundance 

The correlation of mRNA expression patterns between 
control samples and the two rRNA removal treatments 
is summarized in Figure 2a and Supplementary Figure 
S4. The average slope of the linear regression line 
between the controls and DSN-treated samples was 



0.99 (r = 0.99, Pearson's correlation coefficient), whereas 
the corresponding value between the controls and 
Hyb-treated samples was significantly lower (0.75 on 
average, = 0.93). Similarly, the Lowess fit of 
DSN-treated samples converged to linear regression, 
whereas Hyb-treated samples showed a departure from 
linearity, with a skewed shape in the area of the low to 
middle 'expressers'. Hierarchical clustering analysis of all 
samples using the Pearson's product-moment correlation 
coefficient indicated that the mRNA expression profiles of 
the untreated controls were more similar to that of 
DSN-treated samples than that of Hyb-treated samples 
(Figure 2b). 

Robustness of mRNA profiles 

The fold-change in expression levels of each mRNA was 
calculated by dividing the RPKM of stationary (St), heat 
shock (He) or anaerobic (An) samples by the RPKM of 
the exponential (Ex) sample and these fold-change values 
were represented as St/Ex, He/Ex or An/Ex, respectively. 
The fold difference between the control and the two rRNA 
removal treatments was calculated by subtracting the 
log-scale fold value of the corresponding control sample 
from that of DSN- or Hyb-treated sample. The regression 
analyses (Supplementary Figure S5) and boxplots 
(Figure 3) indicated a smaller fold difference in samples 
treated with DSN than Hyb. The slopes of the linear re- 
gression lines of DSN- and Hyb-treated samples were 0.96 
(r = 0.98) and 0.74 (r = 0.90), respectively. The Lowess fit 
analysis of Hyb-treated samples also showed a departure 
from linearity, whereas the fit of DSN-treated samples 
converged to linear regression. The boxplots (Figure 3) 
demonstrated no fold differences in DSN-treated 
samples, with average values close to 0, whereas the cor- 
responding values of Hyb-treated samples ranged between 
0.03 and 0.10, implying significant expression level differ- 
ences caused by the Hyb treatment. 

The transcripts for which relative abundance 
was severely biased using Hyb treatment were identified 
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Figure 1. rRNA removal efficiency of DSN and Hyb methods. The proportion of Illumina reads assigned to mRNA or rRNA indicates that DSN is 
more effective at rRNA removal than Hyb. 
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Figure 2. Comparison of robustness of mRNA relative abundance using DSN and Hyb. (a) Correlation of relative gene expression between the 
control and rRNA removal treatments. The exponential phase libraries (Ex) are shown as an example. Left, control (Ex-C) versus DSN (Ex-D); 
right, control (Ex-C) versus Hyb (Ex-H). The points in the plot indicate the RPKM value of each individual mRNA transcript. Red indicates the 
linear regression line, and blue indicates non-linear regression (Lowess) fit. Green is a straight line with a slope of 1. (b) Hierarchical clustering of all 
samples tested in this study using UPGMA based on Pearson's correlation. The regression plot and dendrogram represent the superior conservation 
of mRNA relative abundance in DSN treatment compared to Hyb treatment. 
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Figure 2. Continued. 



using a threshold of 0.5 at Alog (fold value) because the 
majority of transcripts of DSN-treated samples 
were within the threshold (Figure 3). A total of 127 
ORFs were identified from Hyb-treated samples 
(Supplementary Table S4), and their functional 
categories are provided in Supplementary Figure S6. 
No correlation between the functional categories of the 
transcripts and bias was detected. No selective loss or 



gain of mRNAs depending on GC content was either 
observed in this data set. 

DISCUSSION 

Given the prevalence of rRNA in the total transcriptome 
of prokaryotes, it is essential to enrich mRNA prior to 
sequencing-based genome-wide gene expression studies. 
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Figure 3. Boxplots showing fold difference profiles. The fold-change of each mRNA was calculated by dividing the RPKM of St, He or An samples 
by the RPKM of the Ex sample and is represented as St/Ex, He/Ex or An/Ex, respectively. The fold difference profile caused by rRNA removal 
treatment was calculated by subtracting the log-scale fold-change of the corresponding control sample from the DSN- or Hyb-treated samples. The 
average and standard deviation are shown at the top of the plots. No fold difference was observed using DSN, with the average values close to 0, 
whereas the corresponding values of Hyb ranged between 0.10 and 0.03, implying a significant expression level difference caused by the Hyb 
treatment. 



However, it is equally important to preserve the overall 
gene expression patterns while enriching the mRNA. In 
this study, we evaluated two commercially available 
methods for performing such a task using E. coli, the 
most widely used model bacterium. 

In our study, both the Hyb and DSN methods removed 
a substantial proportion of rRNA species, with DSN 
being 2.5-fold more efficient than Hyb. Our results 
suggest that researchers can reduce the sequencing cost 
of RNA-seq by 2.5-fold by using DSN method, 
compared to the commercial Hyb kit. The rRNA 
removal efficiency of Hyb method obtained in this study 
was comparable with the previous studies. It has been 
known that the rRNA removal efficiency of Hyb 
method varies widely for community RNA samples 
(17,18), as well as for single-species analyses (19). For 
example, the amount of rRNA remained after the com- 
mercial Hyb treatment ranged from 43.6% to 98.6% de- 
pending on microbial species (7). This is because the Hyb 
method is based on Hyb between rRNA and oligonucleo- 
tides that target conserved regions of bacterial rRNA; 
therefore, the removal efficiency is largely dependent on 
the selected oligonucleotide sequences. In contrast, the 



DSN method does not depend on particular rRNA se- 
quences; therefore, in theory, it can be used for any 
organism, including archaea. However, the variation of 
rRNA removal efficiency between samples was also 
observed in DSN treatment, but the reason is unclear at 
this stage. 

The regression analyses demonstrated the lower robust- 
ness of the Hyb method compared to DSN, especially in 
the low- to middle-expression range (approximate 
RPKM < 300). Many transcripts in this range seemed to 
be expressed at higher levels than those in the untreated 
controls. The depletion of high expressers may result in 
the relative overrepresentation of lower expressers. A 
number of severely biased transcripts were found in 
samples that underwent Hyb treatment, although no cor- 
relation between the functional categories of the tran- 
scripts and bias was found. Undesired binding between 
mRNAs and the capture oligonucleotides may cause this 
phenomenon. Indeed, in the case of ribosomal proteins 
(rps, rpl and rpm) that have obvious homology with 
rRNAs, the average RPKM value was reduced to 76% 
of the untreated control after Hyb treatment, whereas it 
was conserved at 105% in DSN-treated samples. An 
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increased proportion of unmapped reads after Hyb treat- 
ment compared to the control or DSN treatment was also 
noted, although the reason for this is unclear. 

Because of the function of the DSN enzyme, which pref- 
erentially degrades highly abundant transcripts over tran- 
scripts of low abundance, we expected that abundant 
transcripts could be affected by DSN treatment. Because 
the absolute levels of gene expression in bacteria vary over 
as much as six orders of magnitude (20-22), the concen- 
tration of highly abundant mRNA classes could be 
affected by DSN degradation. However, as far as we 
surveyed, even the most abundant mRNA typically 
comprise 1-7% of mRNAs (6,16,23), and as much as 
10% in severe cases (24). Thus, the amount of rRNA in 
a cell significantly outnumbers even the most abundant 
mRNA transcripts. In fact, the 10 most expressed genes 
in each condition showed a little bit declined expression 
level in DSN (7.9% on average) compared to untreated 
control as shown in Supplementary Table S3, while the 
amount of decline was much more severe in Hyb 
(20.3%). Moreover, the log scale regression analysis in 
this study proved that the amount of decline observed in 
the abundant mRNAs did not hamper the overall 
robustness. 

The typical ratio of rRNA:non-rRNA revealed by 
RNA-seq is 95-99% in a wide taxonomic range of pure 
cultured bacteria and archaea (7). Thus, our DSN method 
is, at least, applicable to a broad range of prokaryotes in 
laboratory culturing condition. In addition, it has been 
known that the ratio of rRNA compared to mRNA is 
higher in resting cells than actively growing cells (25,26). 
Thus, we think that the overwhelming abundance of rRNA 
compared to non-rRNA would be the case for even slowly 
growing cells or metabolically inactive cells, though experi- 
mental evidence will be required for DSN applicability for 
these conditions in future. Because the rRNA ratios 
reported in a metatranscriptomic study are also as high 
as 74-97% (6), it is fair to say that DSN method would 
be applicable to mixed population samples. 

Several rRNA removal methods are known, but all of 
these methods have some limitations. The methods based 
on Hyb between rRNA and DNA targeting rRNA may 
generate bias depending on the taxonomy of bacteria. 
These methods include the RNase H digestion method 
based on reverse transcription with rRNA-specific 
primers (4) and the Hyb method evaluated in this study. 
Though recently developed subtractive Hyb method has 
solved the bias by generating customized oligonucleotide 
probes targeting sample specific rRNAs (6), it is not still 
free from non-specific binding of rRNA probe to mRNAs. 
The size selection method using gel electrophoresis (5) has 
an apparent limitation because some mRNAs can 
co-migrate with rRNA and are subsequently omitted. 
The poly(A) tail addition methods (3,27), which use pref- 
erential poly(A) adenylation of mRNA in crude RNA, 
also have the possible limitation of uneven poly(A) 
adenylation efficiency among mRNAs, which may 
generate expression-level bias. Because poly(A) tail 
addition method have not been tested in pure cultures 
using RNA-seq, further evaluation is required. In the 
case of 5'-phosphate-dependent exonuclease digestion of 



rRNA with 5' monophosphate, it has already been 
demonstrated that this method compromises the relative 
proportion of the mRNA population more severely than 
the Hyb method (7). This change in the relative propor- 
tion may occur because exonuclease can also eliminate 
5'-monophosphorylated mRNA species, which are 
produced during mRNA processing by endoribonucleases 
(28,29) and RppH (30). 

The only drawback of DSN over the Hyb method is 
that it requires more experimental steps and, therefore, a 
longer time to prepare cDNA libraries for deep 
sequencing. However, because the rRNA removal step is 
performed after cDNA library construction, the possibil- 
ity of unintentional RNA degradation is greatly reduced 
compared to other methods involving pre-treatments. In 
addition, the amount of total input RNA used for deep 
sequencing library construction was much less in the DSN 
method (200 ng) than in the conventional mRNA-seq 
accompanying poly(A) selection (1-10 ug). Considering 
the loss of RNAs during the mRNA enrichment step 
using other pre-treatments, the amount of total RNA to 
be extracted is even smaller using the DSN method. 

Another advantage of the DSN method is that it works 
well even with partially degraded total RNA, whereas the 
Hyb method requires intact rRNA for the successful 
binding of the oligonucleotides to targeted conserved 
sites. Indeed, a well-supported positive correlation 
between RIN values, which represent the degree of 
rRNA integrity, and rRNA removal efficiency (r = 0.88) 
was observed in our Hyb experiments, as well as in a 
previous report (7); however, this type of correlation was 
not observed in our DNS treatments (r = 0.27). Although 
the number of samples (four total RNAs) and the range of 
RNA degradation (RIN value 8.8-10.0) were not strong 
enough for statistical analyses, the lower importance of 
the RNA fragmentation status for effective DNS treat- 
ment compared to the Hyb method was clear. 

Because the conventional RNA-seq of eukaryotic or- 
ganisms relies on poly(A) + capture in the first step of 
sequencing library preparation, the outcome generally 
contains only information on poly(A)-tailed RNAs. If 
the DSN method used in this study is applied to eukary- 
otic RNA, it probably will provide information on both 
non-poly(A) RNA sequences and poly(A) RNA. Further 
study of the feasibility of this method for eukaryotic 
RNA-seq without poly(A) selection is therefore needed. 

In this study, we performed deep sequencing of total 
RNA of E. coli. Although E. coli is a well studied, widely 
used model organism, its precise genome-wide expression 
profile has not been documented previously. To our know- 
ledge, this is the first report on the intact genome-wide 
RNA profile of E. coli without any treatment and selection. 
This information clearly demonstrates the overall abun- 
dance of different RNA species in a bacterial cell. 

DSN-based normalization showed a higher efficiency of 
rRNA removal than the Hyb method, while preserving the 
relative abundance of mRNA. The thermodynamic prin- 
ciple of this technique allows its application to any kind of 
eukaryotic or prokaryotic organism. Therefore, DSN- 
based mRNA enrichment can be readily used in bacterial 
mRNA-seq experiments. 
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